Targeted viromes and total metagenomes capture distinct components of bee gut phage communities

Background Despite being among the most abundant biological entities on earth, bacteriophage (phage) remain an understudied component of host-associated systems. One limitation to studying host-associated phage is the lack of consensus on methods for sampling phage communities. Here, we compare paired total metagenomes and viral size fraction metagenomes (viromes) as methods for investigating the dsDNA viral communities associated with the GI tract of two bee species: the European honey bee Apis mellifera and the eastern bumble bee Bombus impatiens. Results We find that viromes successfully enriched for phage, thereby increasing phage recovery, but only in honey bees. In contrast, for bumble bees, total metagenomes recovered greater phage diversity. Across both bee species, viromes better sampled low occupancy phage, while total metagenomes were biased towards sampling temperate phage. Additionally, many of the phage captured by total metagenomes were absent altogether from viromes. Comparing between bees, we show that phage communities in commercially reared bumble bees are significantly reduced in diversity compared to honey bees, likely reflecting differences in bacterial titer and diversity. In a broader context, these results highlight the complementary nature of total metagenomes and targeted viromes, especially when applied to host-associated environments. Conclusions Overall, we suggest that studies interested in assessing total communities of host-associated phage should consider using both approaches. However, given the constraints of virome sampling, total metagenomes may serve to sample phage communities with the understanding that they will preferentially sample dominant and temperate phage. Video Abstract Supplementary Information The online version contains supplementary material available at 10.1186/s40168-024-01875-0.


Background
Bacteriophage (phage) are the viruses which target bacteria.Phage are hypothesized to be among the most abundant biological entities on earth [1][2][3] and to be important modulators of microbial communities [2,4].In marine and soil environments, where these prokaryotic viruses are relatively well studied, phage shape the microbiomes they infect by lysing dominant bacteria [5][6][7], influencing nutrient turnover and cycling [8,9], and vectoring functional genes between bacterial hosts [10][11][12][13].While the basis for much of our understanding of phage ecology stems from the study of marine and soil environments, there is a growing interest in understanding the role of phage in host-associated systems [14,15].However, host-associated environments and free-living soil and marine environments represent different microbial ecosystems with different selective pressures.As such, insights gained from studying free-living phage communities may not always apply to communities of hostassociated phage.Specifically, while the best methods for sampling free-living phage communities have been empirically compared [16,17], little work has evaluated the best methods for sampling host-associated phage [18].Moreover, given the importance of microbiomes to host health, understanding phage community dynamics and how phage differ between related animal hosts is a key priority.
Like their environmental counterparts, phage infecting host-associated bacteria (hereafter "host-associated" phage) can affect the abundance, composition, and function of host-derived microbial communities.However, unlike free-living phage, host-associated phage must also adapt to their close association with a host and its microbiome.For example, the comparatively high-density bacterial communities associated with the human gut [19,20], mucosal layer of corals [21], and other host environments are suggested to favor integrase encoding temperate phage, as opposed to obligately lytic phage [22][23][24].Phage targeting bacteria adhered to gut mucosa have also been shown to evolve the ability to adhere to and persist in the animal mucosa where their target bacteria reside [25].Given the fundamental differences in the ecology of free-living and host-associated phage communities [22], it remains unclear whether sampling methods developed for free-living communities are also appropriate for other habitat types.
Two methods are frequently used for describing dsDNA viral communities: bioinformatic mining of total metagenomes and targeted viral enrichments (viromes).In bioinformatic mining of total metagenomes, total genomic DNA is extracted from a sample, amplified when necessary, sequenced using "shotgun" or untargeted methods, assembled into metagenomes, and phage computationally mined [26,27].While this approach is easy to perform and offers simultaneous characterization of all DNA in a sample (including bacteria), only a small minority of the total metagenomic data generated will originate from phage, producing a relatively shallow sampling of phage communities.An alternative method, targeted virome sequencing, leverages the physical characteristics of viruses to select for the phage and virus-like particle fraction of a sample prior to nucleic acid extraction and sequencing [28][29][30].With more sequencing space devoted to viruses, targeted viromes can recover a greater diversity of phage relative to total metagenomes.This is illustrated by previous comparisons of viromes and total metagenomes in soil and freshwater samples [16,17].However, whether viromes always outperform total metagenomes is not clear.Work applying these two methods to host-associated (human gut) and low-biomass (deep-sea marine) samples has produced conflicting results [17,18].One explanation for these discrepancies is the biases associated with total metagenomes and targeted viromes [31,32].For example, viromes remove bacterial cells prior to sequencing and, as a result, may select against integrated, non-replicating, temperate phage [33,34].This is especially relevant in host-associated systems where temperate phage are abundant [33].Total metagenomes can also be preferable to viromes logistically.Viromes typically demand a relatively large amount of biomass to generate enough nucleic acids for sequencing.While protocols, such as the NetoVIR protocol [30], have successfully been used to circumvent this limitation, it is important to note that the random amplification used in these approaches can introduce another form of bias.Moreover, the pooling of sample material from multiple individuals used in this and other approaches can obscure inter-individual variation in viral communities.
Insects have become valuable models for exploring host-microbe interactions [35][36][37].Social bees in particular house a simple (5-9 taxa), highly conserved, socially transmitted, gut bacterial community [38][39][40][41] which contributes to host nutrient acquisition and pathogen defense [42][43][44].These features position social bees as an excellent model for studying not only host-microbe interactions [45,46] but tripartite host-microbe-phage interactions as well [47].A series of recent studies using targeted virome approaches have shown that honey bees host a diversity of novel phage which target core bee gut bacteria [48][49][50].While this work has advanced our understanding of bee phage, no work has evaluated how sampling method influences recovered bee phage communities.Given the large biomass required for viromes, total metagenomes may be advantageous for surveying the phage communities associated with bees and other biomass limited systems.Additionally, describing the phage associated with other social bees will develop our understanding of how phage communities differ between related hosts.Lastly, broader work comparing the role and diversity of phage among small invertebrate hosts will hinge on appropriate sampling methods, so comparing the performance of viromes and total metagenomes is particularly important for enabling future comparative study.
Here, we evaluate the phage communities inferred by applying total metagenomes and targeted viromes to managed Apis mellifera and commercially raised Bombus impatiens gut material.We hypothesize that targeted viromes will enrich for phage sequences and capture a greater diversity of phage, relative to bioinformatic mining of total metagenomes, and that phage will differ between host bee species.To interrogate these hypotheses, we first examine how host bee species and sampling method impact sequencing read data and viral enrichment.We then validate our sampling and computational methodology through rarefaction and by comparing our phage sequences to those previously described in honey bees.Next, to test how phage community differs across host bee and sampling methods, we compare phage community diversity, structure, and composition.We then use gene content, occupancy-abundance plots, qPCR, and bacterial community profiling to delve into what contributes to the apparent differences in phage communities.Overall, we find that honey bees host more phage than do bumble bees, but that the method which captured the most phage differed by bee species.This suggests that viromes and total metagenomes are each biased in their own way and appear to sample different populations of phage.As a result, we propose these techniques are complementary in describing the full diversity of host-associated phage.

Bee rearing, sampling, and experimental design
A detailed description of our methods (including sample collection and bench and computational work) is provided in the supplemental materials.Briefly, bees were harvested from three colonies of managed honey bees (Apis mellifera) and three colonies of commercially reared bumble bees (Bombus impatiens, Koppert Biological Systems; Howell, MI, USA).Bumble bee colonies were maintained in the laboratory and provided with honey bee collected pollen and artificial nectar (Koppert Biological Systems) ad libitum, as per manufacturer's recommendation.Honey bees were sampled from colonies maintained at the University of California, Davis, apiary and were collected from inside the colony using a handheld bee vacuum between January 30 and February 13, 2023, 8:00 am-10:00 am.
Bees were anesthetized via a 60-s CO 2 exposure and then euthanized via decapitation.Immediately following sacrifice, the mid-hindgut section was dissected from 100 bees (all collected from the same colony) and pooled for same-day virome extraction.Bees destined for total metagenomic DNA extraction were collected the same day as virome samples and were stored at −20 °C pending mid-hindgut dissection on the following day.All dissections took place in sterile PBS using ethanol and flame sterilized forceps.Each honey and bumble bee colony generated one targeted virome and three total metagenomes (Fig. 1).

Sample preparation and DNA extraction
Phage enrichments for targeted virome sequencing were carried out following a protocol adapted from previous publications [28,29,51] and is described fully Fig. 1 Graphical representation of the sampling scheme and methods used in this research.A total of three bumble bee and three honey bee colonies were sampled.From each colony of bee, we generated three total metagenomes and one targeted virome.Total metagenomes were sampled from individual bees, while targeted viromes were produced from the pooled guts of 100 bees.This sampling resulted in nine bumble bee total metagenomes (light blue), three bumble bee viromes (dark blue), nine honey bee total metagenomes (light orange), and three honey bee viromes (dark orange) in the supplemental methods.Briefly, bee gut pools were combined with 10 ml of protein-supplemented phosphate-buffered saline (PPBS: 2% BSA, 10% PBS, 1% K-citrate, 150-mM MgSO4), homogenized, and submitted to a series of washes aimed at eluting phage particles.Phage were enriched via centrifugation, 0.22-µm filtration, and ultracentrifugation.Lastly, phage pellets were resuspended in 200-µL water and DNase treated.
For total metagenomes, mid-hind guts of individual bees were dissected and homogenized and used for extraction.DNA was extracted from both virome and total metagenome samples using a DNeasy PowerSoil Pro kit (Qiagen, Hilden, Germany) following the manufacturer's instructions with the optional Qiagen Vortex Adapter 15-min bead beating step.

Library construction and DNA sequencing
Extracted DNA from samples and kit blank-negative controls was submitted to the University of California Davis Genome Center for library prep and paired-end 150-bp shotgun metagenomic sequencing.Libraries were prepared using a KAPA HyperPrep Kit (Kapa Biosystems, Roche, Basel, Switzerland) with Illumina TruSeq adapters (Illumina, San Diego, CA, USA) and a target insert size of 350 bp.Sequencing took place on an Illumina NovaSeq.All samples were sequenced on the same run.

Read processing, viral prediction, and vOTU table construction
Briefly, all demultiplexed .fastq.gz files from the Davis Genome Center were trimmed and quality filtered using Trim-Galore [52] and Trimmomatic [53].Reads aligning to negative samples, or the genomes of A. mellifera (GCA_000002195.1)and B. impatiens (GCF_000188095.3),were removed using Bowtie2 [54].Cleaned reads were k-merized for complexity comparisons using Sourmash [55].The program metaSPADES was used to construct assemblies.Assemblies ≥ 5 kb were passed to geNomad [56] for phage prediction.Putative phage sequences predicted to be < 50% complete by CheckV [57] were dropped.Remaining sequences ≥ 95% average nucleotide identity were clustered into viral operational taxonomic units (vOTUs) using dRep [58] and annotated with Prokka [59] using the PHROGs database [60].The programs BACPHLIP [61] and geNomad were then used to predict the lifestyle and taxonomy of identified vOTUs, respectively.Finally, CoverM [62] was used to map cleaned reads against vOTUs.The resulting coverage table was normalized to coverage per million reads.

Statistical and ecological analyses
Statistical analyses were conducted in R [66] v4.2.3.Briefly, alpha diversity, beta diversity, and PERMANO-VAs were compared between host and sampling method using vegan [67] and phyloseq [68].Linear mixed effect models were built using lme4 [69], and post hoc tests were run using emmeans [70].T-tests were performed using base R. Gene-sharing networks were generated outside of R using the program vConTACT2 [71,72].Genome alignment plots were constructed using clinker [73].
Phage-host predictions were performed using a custom CRISPR-spacer analysis.In short, we built bacterial metagenome-assembled genomes (MAGs) from our total metagenomic dataset using metaSPADES, MetaBat2 [74], and dRep.We then combined these MAGs with NCBI assemblies of common bee gut bacteria, mined all these sequences for CRISPRs using MINCED [75], and then aligned those CRISPRs to our vOTUs using Blastn [76] following the parameters suggested by Edwards et al. [77] Lastly, to improve the number of vOTUs with bacterial host and viral taxonomic assignments, we combined our CRISPR-based phage-host assignments and geNomad taxonomic predictions with the clusters produced by vConTACT2 to infer the bacterial hosts and viral taxonomies of whole phage clusters, similar to Bonilla-Rosso et al. [48] Bacterial 16S copy number was compared between bee species using linear mixed effect models.

Bumble bee viromes produce fewer high-quality reads than honey bee viromes
Our sequencing effort generated a total of 419,884,845 raw read pairs, 2.47% (10,385,967) of which came from extraction negatives.While the number of raw read pairs did not differ significantly between samples (Fig. S1; Bee t 5.40 = 0.54, p = 0.61, Type t 8.47e23 = 1.58, p = 0.11, Bee:Type t 1.07e23 = −1.38,p > 0.17), there were differences in the number of high-quality read pairs remaining after quality control (Bee t 5.81 = 2.87, p < 0.05, Type t 2.75e25 = −0.33,p = 0.74, Bee:Type t 2.75e25 = 4.69, p < 0.001) (Fig. 2A).After filtering, honey bee viromes averaged 15.56 million high-quality read pairs per sample, while bumble bee viromes averaged only 5.49.Total metagenome samples generated more similar numbers of high-quality reads per sample, with an average of 8.57 and 5.85 million read pairs generated from honey and bumble bee samples, respectively.

Bumble bee virome reads are less complex than other samples
Next, we measured the complexity of read libraries to further investigate how bee species and sampling method influenced the data we generated.This was done by comparing the number and occurrence of 31-bp-sized k-mers among each set of high-quality read libraries (Fig. 2B).The results of this analysis suggest that across the range of k-mer occurrences, bumble bee viromes consistently contain fewer k-mers of a given occurrence than did other sample types.Moreover, viewing only the y-intercept of this plot (Fig. 2C) shows that bumble bee viromes contain a lower number of singleton k-mers, relative to all other sample types (Bee t 5.34 = 2.49, p = 0.052, Type t 16 = −6.47,p < 0.001, Bee:Type t 16 = 5.14, p < 0.001).This decreased number of singleton k-mers and the tendency towards lower k-mer occurrences suggest that bumble bee viromes are less complex compared to the other sample types in this dataset.

Targeted virome assemblies are enriched in phage
To evaluate the size and number of assemblies produced from each read library, and to quantify the extent to which viromes enriched for phage sequences, we visualized the distribution of assembly sizes in each sample, as well as the percentage of assemblies in a sample predicted to encode phage sequences (Fig. 2D).A total of 4278 and 37 contigs ≥ 5 kb were assembled from honey and bumble bee viromes, respectively.After passing to geNomad, these contigs yielded 2883 honey bee and 32 bumble bee putative phage sequences.Meanwhile, honey and bumble bee total metagenomes produced 16,615 and 14,335 contigs ≥ 5 kb, respectively, which yielded 662 and 276 putative phage sequences.This means that 65.74% of honey bee and 75.82% of bumble bee virome contigs ≥ 5 kb were predicted to be viral, while only 4.06% and 3.43% of honey and bumble bee total metagenome contigs ≥ 5 kb were annotated as viral.The enrichment of phage sequences in viromes, relative to total metagenomes, provides validation that viromes successfully enrich for phage when applied to bee guts.
Next, we used rarefaction to assess if the different number of putative phage sequences recovered by honey and bumble bee viromes (Fig. 2D) was a result of differences in sampling depth (Fig. 2A).This analysis shows that even at equal sampling depths, honey bee viromes consistently recover approximately 20× the number of phage recovered from bumble bee viromes (Fig. S2A and B).

Phage target core bee bacteria and resemble previously described honey bee phage
After measuring the ability of viromes to enrich for phage sequences, we next sought to characterize the phage communities we detected in terms of similarity to previously described bee phage communities, putative bacterial hosts, and predicted viral taxonomy.To do this, we first removed phage predicted to be low quality (< 50% complete).This reduced the total number of sequences recovered from 3853 to 655.Next, we constructed vOTUs by collapsing the remaining high-quality phage sequences at a 95% average nucleotide identity, further reducing the number of putative phage from 655 to 609.
We then built gene-sharing networks comparing our vOTUs to those previously described by other bee phage studies (Fig. 3A), as well as reference phage taken from vConTACT2's NCBI's RefSeq database (Fig. S3).In both network, many of the vOTUs from the current study clustered with those described by other bee phage studies, suggesting they are related at roughly the genus level [71,72].For example, viral cluster 149 contains sequences recovered by the current study, Bonilla-Rosso et al. [48], Deboutte et al. [49], and Busby et al. [50] (Fig. S4).We interpret these results as further validating our sample preparation and computational methodologies.
Next, we assigned bacterial hosts to our vOTUs.To focus this and all downstream analyses on only the most abundant phage, we filtered vOTUs to retain only those with a normalized coverage ≥ 1, resulting in 477 vOTUs across 24 samples.Overall, we assigned hosts to 51.99% (248) of the 477 vOTUs appearing in our dataset (Fig. 3B).In terms of relative abundance, phage with assigned hosts made up an average of 4.03% and 18.81% of bumble bee viromes and total metagenomes and 60.92% and 65.97% of honey bee viromes and total metagenomes, respectively.Notably, the high proportion of bumble bee phage with unassigned hosts, especially in viromes, is likely confounded by the fact that these samples had an extraordinarily diminished phage diversity.

Bee species differ in phage diversity and composition
Next, we tested if vOTU diversity and community composition differed between bee species and sampling method.While the diversity and richness of communities followed similar trends (Fig. 4A and B), there were substantial quantitative differences between these two metrics.When we compared Shannon's diversity, sampling method and the interaction of bee species and sampling method significantly affected vOTU diversity, while bee species alone did not (Fig. 4A; Bee t 5.93 =2.16 p=0.074;Type t 16 =-2.39p<0.05;Bee:Type t 16 =5.09p<0.001).Meanwhile, only the interaction between bee species and sampling method significantly impacted community richness (Fig. 4B; Bee t 20 = 0.75, p = 0.46; Type t 20 = −0.85,p = 0.41; Bee:Type t 20 = 19.15,p < 0.001).Overall, these results suggest that sampling method can significantly influence the alpha diversity of inferred phage communities, but that this can differ in magnitude and direction according to bee species.
In honey bees, 21.56% (91/422) of all vOTUs were shared between virome and total metagenome samples.Similarly, in bumble bees, 23.21% (13/56) of vOTUs were detected by both sampling methods.Only one vOTUs was detected in both bee species (Fig. 4C).vOTU community composition was predicted primarily by host bee species and secondarily based on sample method (

Bacterial community variation predicts phage community variation
We hypothesized that differences in phage richness and community composition between bee species may be driven by differences in bacterial abundance and composition.First, we used qPCR to compare the number of bacteria found in the guts of honey and bumble bees (Fig. 5A) and found that the average mid-hindgut 16S gene copy number was significantly lower in bumble bees than it was in honey bees (t 9.13 = −3.54,p < 0.01).
We then used Kraken2 and Bracken to estimate the diversity and taxonomic profile of bacterial communities (Fig. 5B, C, D).At the species level, bumble bee bacterial communities hosted lower bacterial richness and greater Shannon's diversity than honey bee bacterial communities (Richness t 13.78 = −6.27,p < 0.001, Nodes with "previously described" as a predicted host are the vOTUs previously described by other bee phage papers [48][49][50].B Stacked bar plot describing the community of phage found in each sample based on predicted host.Both A and B use the same color palette Shannon t 11.70 = −6.27,p< 0.001), suggesting fewer bacterial taxa were present in the bumble bees we sampled, but that bumble bee bacterial communities were more evenly divided among their constituent bacteria.
Lastly, we used a Mantel test to compare the relationship between bacterial and phage community structure (Fig. S5).We found a high degree of similarity between the bacterial and phage communities (R = 0.59, p < 0.001), meaning that individual bees with similar bacterial communities also have similar phage communities.

Total metagenomes capture prophage but miss rare phage
After identifying factors which explain some of the hostspecific differences in phage communities, we sought to evaluate why sampling method (virome vs total metagenome) also led to significant differences in inferred phage communities.For this, we first compared the gene content of vOTUs recovered from viromes and total metagenomes.Because total metagenomes are primarily comprised of bacterial DNA, we hypothesized that this sampling method would enrich for temperate phage.In both honey bees and total metagenomes, temperate Fig. 4 Figures describing the alpha beta diversity of the phage communities identified in our samples.Blue represents samples taken from bumble bees.Orange represents samples taken from honey bees.Lighter colors are total metagenomes.Darker colors are viromes.A and B Boxplots describing Shannon's diversity and vOTU richness associated with each of our sample types.The x-axis group samples.The y-axis shows diversity and richness scores.C Euler diagram describing phage community overlaps between each of our sample types.Numbers correspond to the number of phage present in each section of the graph (i.e., 91 phage were found in both honey bee viromes and total metagenomes).Circle size is proportional to number of phage.C Nonmetric multidimensional scaling (NMDS) ordination describing the Bray-Curtis dissimilarity of all samples on our dataset.Individual points are samples.Color represents sample type.The closer 2 points are to each other, the more similar they are in phage community phage comprised a greater proportion of the entire vOTU community, relative to bumble bees and viromes (Fig. S6; Bee t20 = 3.88, p < 0.001, Type t 20 = −2.32p<0.05,Bee:Type t 20 = 0.31, p = 0.76).In terms of relative abundance, integrase encoding phage were more abundant in total metagenomes than they were in viromes (Fig. 6A; Bee t 17.36 = −1.56,p = 0.14; Type t 16 = 3.80, p < 0.01; Bee:Type Type t 16 = −1.53,p = 0.12).Overall, these results suggest that, in this system, total metagenomes are biased towards sampling temperate phage.
Lastly, to examine how sampling methods differ in recovery of rare vs abundant phage, we visualized the occupancy-abundance relationships of vOTUs vs vOTUs recovered by both methods.This analysis revealed that while there is little to no difference between the relative abundance of vOTUs identified in viromes vs total metagenomes (Fig. 6B), viromes are better able to capture vOTUs with low occupancy (Fig. 6C and D).

Discussion
Although phage are integral members of complex microbiomes, methods for describing phage communities are still being developed.Here, we compare how two common forms of phage community sampling, bioinformatic mining of total metagenomes and targeted sequencing of viromes, influence the phage communities recovered from the guts of two commercially important bee species: Apis mellifera and Bombus impatiens.Similar to the results of previous studies [16][17][18], we show that sampling method significantly affected alpha diversity, beta diversity, composition, and structure of sampled phage communities.In particular, we show that viromes outperform total metagenomes in terms of the number Fig. 5 Honey and bumble bees differ in bacterial community density, diversity, and composition.A Boxplots displaying the 16S rRNA gene copy number present in the mid-hindgut region of sampled honey and bumble bees.B Boxplot presenting the number of observed bacterial species in honey and bumble bee.C Boxplot showing the species level Shannon's diversity of honey and bumble bee gut bacterial communities.D Stacked bar plot describing honey and bumble bee bacterial communities at the genus level of phages recovered, but that this can depend on the specific environment being sampled.Overall, viromes were better at sampling rare and less prevalent phage when bacterial and phage diversity was high, while total metagenomes captured a greater diversity of phage in low biomass samples.We also show that regardless of sample biomass, total metagenomes were enriched for temperate phage, compared to viromes, and were consistently able to recover phage not found in viromes.Given these results, we suggest that viromes and total metagenomes each have limitations, are complementary, and that choice of one method over the other likely depends on the environment being sampled.

Honey bees host a core phage community
Viewing our results within the broader context of bee phage, our analyses show that many features of the phage communities identified here resemble those previously described in honey bees sampled from North America and Europe [48][49][50].For example, Herelleviridae (formerly Myoviridae [78]) and Rountreeviridae (formerly Podoviridae [79]) were among the most abundant viral families that we and these previous studies identified.Similarly, Lactobacillus, Gilliamella, Bifidobacterium, and other core bee gut bacteria, were among the most predicted phage hosts in our and previous studies.Interestingly, four of the viruses identified in our study were predicted to belong to Totiviridae, a family of viruses known to target yeasts and other eukaryotes [80].Given the importance of yeasts in maintaining bee health [81], especially in bumble bees [82], this suggests that yeast targeting viruses may be an important member of bee gut microbial communities.Notably, while previous bee phage studies found a number of phage predicted to target Bartonella, none of our phage was predicted to target this bacterial genus.This is likely because no available Bartonella apis assemblies met the strict criteria for inclusion in our CRISPR spacer analysis.
Delving deeper, our gene-sharing network and clinker analyses show not only that many of our phage share large genomic regions with previously described honey bee phage but also that gene order in these geographically disparate viral genomes is also conserved (Fig. S4).This conservation between honey bee phage communities is similar to what was observed by Busby et al. [50].Likewise, Rosso et al. [48] showed that the genomes of honey bee-associated phage sampled from Europe are able to recruit reads from Japanese honey bee total metagenomic datasets.
Altogether, these findings support the idea that similar to the highly conserved nature of some bee gut bacterial communities [38,39,46,83], some features of bee gut phage communities are also conserved between bees sampled across geographically disparate populations.Future work should focus on identifying what features of phage communities, whether individual genes, clusters of genes, or perhaps whole genomes, are conserved between different populations of bees.

Host bacterial communities predict phage communities
When compared to free-foraging managed honey bees, the commercially produced bumble bees we sampled hosted significantly lower diversity phage communities.Although differences in species biology and colony size may play a role, we attribute this low phage diversity primarily to the low density and diversity bacterial communities these bumble bees hosted, suggesting that phage diversity and abundance may track bacterial diversity and abundance across hosts.However, whether such results are generalizable to wild bumble bees is unclear.In our study, bacterial 16S rRNA copy number in bumble bee guts ranged from 10 6 to 10 7 , whereas previous work in the same species reported 10 8 -10 9 copies of the 16S rRNA gene [38,[84][85][86].Given we used standard rearing methods and diet, we hypothesize that lower bacterial titer may be due to the age of bees we sampled.Previous work in bumble bees shows that bacterial density increases with worker age, saturating approximately 4 days after worker emergence [84].Future work could examine how phage abundance and composition changes through worker development and if this reflects previous observations of bacterial succession with worker age [17].More generally, these findings suggest that features of host bacterial communities (i.e., density, diversity, and structure) may be used to predict phage community features.This is further supported by our Mantel analysis showing a correlation between bacterial and phage community dissimilarity.

Viromes and total metagenomes reveal complementary phage communities
There are several possible reasons as to why the phage communities inferred by viromes and total metagenomes differ.One explanation is the way we generated viromes and how this influenced the specific population of phage sampled.Research by Hoyles et al. [87] has shown that passing human fecal samples through a 0.22-µm filter reduces the number of phage recovered by nearly half.As such, size filtration may have led some phage which are present in our bee guts to have been excluded from viromes.This size exclusion may explain why we, and previous virome vs total metagenome studies [16][17][18], consistently show that some phage are only found in total metagenomes.Other potential explanations include sheer random chance (two independent sampling events each recovering unique elements) or sample processing (total metagenome samples being frozen prior to DNA extraction).While random chance cannot be fully ruled out, given that total metagenomes did not receive a DNase treatment prior to sequencing, disruption of phage capsid proteins by freezing likely did not have a substantial impact on inferred phage communities.Instead, the particular types of phage which are excluded by size filtration likely further explain the total metagenome and virome differences we observed.We found that putative temperate phage were more abundant and prevalent in total metagenomes than they were in viromes.This agrees with existing literature which suggests that size filtration can specifically remove integrated lysogenic phage, large jumbo phage, and phage adhered to bacterial cells and particles [32,34,88,89].While it is tempting to expand these results to different environments by stating total metagenomes always sample temperate phage missed by viromes, work in other systems implies a more nuanced reality.In agricultural soils, Medellin et al. [16] identified only three phages that were present in total metagenomes and absent from viromes.This suggests that the degree to which sample processing skews viromes differs by environment.As a result, we suggest that the relative utility of viromes and total metagenomes likely depends on the environment being sampled.Samples with a high abundance of temperate phage, such as host-associated systems and low-biomass environments, may benefit from total metagenomes.This is similar to the conclusions made by Kosmopoulos et al. [17] which suggest that the choice of viromes vs total metagenomes should be environment specific.
While our results highlight the ability of total metagenomes to recover phage missed by viromes, they also showcase the capacity of viromes to sample a greater diversity of phage.Similar to previous comparisons of viromes and total metagenomes in human gut, soil, and freshwater environments [17,18,29], our honey bee viromes recovered a substantially larger number of phage than did total metagenomes.Further, the occupancy-abundance relationships examined here and by Medellin et al. [16] show that total metagenomes tend to be biased towards sampling the most abundant and prevalent phage, while viromes are more successful at sampling rare phage.Taken together, these results support previous work documenting that viromes produce a deeper sampling of phage communities compared to total metagenomes [32,90,91].

Conclusions
By comparing bioinformatic mining of total metagenomes and targeted viromes across two bee species, we found that sampling method significantly affected inferred phage communities, but that the directionality of these differences can depend on the host being sampled.In general, total metagenomes tended to be biased towards sampling putatively temperate phage.In samples with a relatively high biomass (e.g., honey bees), viromes produced a greater diversity of phage and a better sampling of rare phage.In contrast, when applied to relatively low biomass samples (e.g., bumble bees), total metagenomes captured a greater diversity of phage than did viromes.Regardless of sample biomass, there were always phage unique to both viromes and total metagenomes.In conclusion, we suggest that these methods are complementary and recommend that both be used to capture the full diversity of phage present in a gut environment.However, given that virome sampling is not always feasible (i.e., in the case of field collected insects), total metagenomes may serve to sample phage communities with the caveat that they will preferentially sample dominant and temperate phage.

Fig. 2
Fig. 2 Figures describing sequencing and assembly quality.Color denotes sample groups.A Boxplots describing sequencing depth.The x-axis shows sample groups.The y-axis displays the total number of high-quality reads produced.B Line plots describing sequence complexity of the reads shown in A, as measured by the frequency distributions of 31-bp-sized k-mers.The x-axis presents k-mer occurrence (i.e., how abundant a particular k-mer was in a given sample's read), while the y-axis shows the number of k-mers with a certain occurrence.C Boxplots describing the number of unique 31-bp-sized k-mers present in the read libraries of each group of samples.The x-axis shows sample groups.The y-axis shows the number of unique k-mers.D Jittered dot plots describing the length distributions of contigs assembled from each sample.Only contigs > = 1 kbp bp are shown.Individual samples are shown on the y-axis.The x-axis shows contig length (log 10 scale).Each point represents a single contig.A dotted red line is drawn at 5 Kbp.For each sample, a green square is plotted phage enrichment (the number of phage identified, divided by number of contigs assessed for phage, times 100) Figure 4D; Bray-Curtis PERMANOVA Bee R 2 = 0.21, F 1,20 = 7.390, p < 0.001; Method R 2 = 0.11, F 1,20 = 3.93, p < 0.001; Bee:Method R 2 = 0.11, F 1,20 = 3.93, p < 0.001).

Fig. 3
Fig. 3 Figures describing gene sharing among the phage identified in our dataset and the overall phage communities found in individual samples.A Weighted gene-sharing network of all non-singleton vOTUs identified in our study and by previous bee phage studies.Individual nodes are vOTUs.Nodes are connected by edges when vOTUs share genes.Nodes are colored based on the predicted bacterial host of phage.Nodes with "previously described" as a predicted host are the vOTUs previously described by other bee phage papers [48-50].B Stacked bar plot describing the community of phage found in each sample based on predicted host.Both A and B use the same color palette

Fig. 6
Fig. 6 vOTUs sampled by viromes and total metagenomes differ in integrase content, occupancy, and abundance.A Stacked bar plot showing the relative abundance of temperate and lytic vOTUs in virome and total metagenome samples.B Density distributions showing the average relative abundance of all vOTUs found in viromes.vOTUs unique to viromes are shown in blue.vOTUs found in both viromes and total metagenomes are shown in red.C Dot plot showing the occupancy-abundance relationship of all virome vOTUs.The x-axis describes average relative abundance, while the y-axis shows the percent occupancy of each vOTU.As before, vOTUs unique to viromes are shown in blue, and those found in both viromes and total metagenomes are shown in red.D Stacked bar plot describing the percent of vOTUs found in one, two, or three samples.The x-axis represents the percent of vOTUs.The y-axis the occupancy.Similar to B and C, vOTUs unique to viromes are shown in blue, and those found in both viromes and total metagenomes are shown in red